Abstract: We study a class of codes for compressing memoryless
Gaussian sources, designed using the statistical framework
of high-dimensional linear regression. Codewords are linear 
combinations of subsets of columns of a design matrix. With 
maximum-likelihood encoding we show that such a codebook 
can attain the rate-distortion function with the optimal error- 
exponent, for all distortions below a specified value. The structure 
of the codebook is motivated by an analogous construction 
proposed recently by Barron and Joseph for communication over 
an AWGN channel. 

I. Introduction 

One of the important outstanding problems in informa- 
tion theory is the development of practical codes for lossy 
compression of general sources at rates approaching Shan- 
non's rate-distortion bound. In this paper, we study the 
compression of memoryless Gaussian sources using a class 
of codes constructed based on the statistical framework of 
high-dimensional linear regression. The codebook consists of 
codewords that are sparse linear combinations of columns of 
an $n \times N$ design matrix or 'dictionary', where n is the block
length and N is a low-order polynomial in n. Dubbed Sparse
Superposition Codes or Sparse Regression Codes (SPARC),
these codes are motivated by an analogous construction
proposed recently by Barron and Joseph for communication over
an AWGN channel [1], [2]. The structure of the codebook
enables the design of computationally efficient encoders based 
on the rich theory on sparse linear regression. Here, we study 
the performance of these codes under maximum-likelihood 
(ML) encoding. The design of feasible encoders will be 
discussed in future work. 

Sparse regression codes for compressing Gaussian sources 
were first considered in [3] where some preliminary results 
were presented. In this paper, we analyze the performance 
of these codes under ML encoding and show that they can 
achieve the distortion-rate bound with the optimal error expo- 
nent for all rates above a specified value (approximately 1.15 
bits/sample). The proof uses Suen's inequality [4], a bound on
the tail probability of a sum of dependent indicator random 
variables. This technique may be of independent interest and 
useful in other problems in information theory. 

We lay down some notation before proceeding further.
Upper-case letters are used to denote random variables, lower-case
letters their realizations, and bold-face letters random vectors
and matrices. All vectors have length n. The source sequence is
denoted by $\mathbf{S} = (S_1, \ldots, S_n)$, and the reconstruction
sequence by $\hat{\mathbf{S}} = (\hat{S}_1, \ldots, \hat{S}_n)$.
$\|\mathbf{X}\|$ denotes the $\ell_2$-norm of vector $\mathbf{X}$, and
$|\mathbf{X}| = \|\mathbf{X}\| / \sqrt{n}$ is the normalized
version. We use natural logarithms, so entropy is measured
in nats. $f(x) = o(g(x))$ means $\lim_{x \to \infty} f(x)/g(x) = 0$;
$f(x) = \Theta(g(x))$ means $f(x)/g(x)$ asymptotically lies in an
interval $[k_1, k_2]$ for some constants $k_1, k_2 > 0$.

Consider an i.i.d Gaussian source S with mean 0 and
variance $\sigma^2$. A rate-distortion codebook with rate R and
block length n is a set of $e^{nR}$ length-n codewords, denoted
$\{\hat{\mathbf{S}}(1), \ldots, \hat{\mathbf{S}}(e^{nR})\}$. The quality of
reconstruction is measured through a mean-squared error
distortion criterion

$$d_n(\mathbf{S}, \hat{\mathbf{S}}) = |\mathbf{S} - \hat{\mathbf{S}}|^2 = \frac{1}{n} \sum_{k=1}^{n} (S_k - \hat{S}_k)^2,$$

where $\hat{\mathbf{S}}$ is the codeword chosen to represent the source
sequence $\mathbf{S}$. For this distortion criterion, an optimal
(maximum-likelihood) encoder maps each source sequence to the
codeword nearest to it in Euclidean distance. The rate-distortion
function $R^*(D)$, the minimum rate for which the distortion
can be bounded by D with high probability, is given by [5]

$$R^*(D) = \min_{p_{\hat{S}|S} :\, E(S - \hat{S})^2 \le D} I(S; \hat{S}) = \frac{1}{2} \log \frac{\sigma^2}{D} \ \text{nats/sample}. \tag{1}$$

This rate can be achieved through Shannon-style random
codebook selection: pick each codeword independently as an
i.i.d Gaussian random vector distributed as Normal(0, $\sigma^2 - D$).
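
As a quick numerical illustration of (1), the following minimal sketch evaluates the Gaussian rate-distortion function in nats and converts it to bits. The function names `rate_distortion_nats` and `nats_to_bits` are illustrative choices and do not appear in the paper; returning zero for D at or above the source variance is the standard convention and is an assumption here, since the text only treats D below the source variance.

```python
import math


def rate_distortion_nats(sigma2: float, D: float) -> float:
    """Evaluate (1): 0.5 * log(sigma2 / D) nats per sample.

    Assumption: for D >= sigma2 the rate is taken to be zero.
    """
    if sigma2 <= 0.0 or D <= 0.0:
        raise ValueError("sigma2 and D must be positive")
    return max(0.0, 0.5 * math.log(sigma2 / D))


def nats_to_bits(x: float) -> float:
    """Convert a rate from nats to bits."""
    return x / math.log(2.0)


if __name__ == "__main__":
    # Unit-variance source compressed to a quarter of its variance.
    r = rate_distortion_nats(1.0, 0.25)
    print(r, nats_to_bits(r))
```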

Lattice-based codes for Gaussian vector quantization have
been widely studied, e.g., [6], [7]. There are computationally
efficient quantizers for certain classes of lattice codes, but
the high-dimensional lattices needed to approach the
rate-distortion bound have exponential encoding complexity [7].
We also note that for sources with finite alphabet, various
coding techniques have been proposed recently to approach the
rate-distortion bound with computationally feasible encoding
and decoding [8]-[11].

II. Sparse Regression Codes 

A sparse regression code (SPARC) is defined in terms of a
design matrix $\mathbf{A}$ of dimension $n \times ML$ whose entries are
zero-mean i.i.d Gaussian random variables with variance
$\frac{a^2 - D}{L}$, where the constant $a^2$ will be specified in the
sequel. Here n is the block length and M and L are integers whose
values will be specified shortly in terms of n and the rate R. As
shown in Figure 1, one can think of the matrix $\mathbf{A}$ as composed
of L sections with M columns each. Each codeword is the sum of L
columns, with one column from each section. More formally, a
codeword can be expressed as $\mathbf{A}\beta$, where $\beta$ is a
binary-valued $ML \times 1$ vector $(\beta_1, \ldots, \beta_{ML})$ with the
following property: there is exactly one non-zero $\beta_i$ for
$1 \le i \le M$, one non-zero $\beta_i$ for $M + 1 \le i \le 2M$, and so
forth. Denote the set of all $\beta$'s that satisfy this property by
$\mathcal{B}_{M,L}$.

[Figure 1: the design matrix A drawn as Section 1, Section 2, ..., Section L, with M columns each, above the binary vector $\beta$, which has a single 1 in each section.]

Fig. 1: $\mathbf{A}$ is an $n \times ML$ matrix and $\beta$ is an $ML \times 1$ binary vector.
The positions of the ones in $\beta$ correspond to the gray columns of $\mathbf{A}$
which add to form the codeword $\mathbf{A}\beta$.

Maximum-likelihood Encoder. This is defined by a mapping
$g : \mathbb{R}^n \to \mathcal{B}_{M,L}$. Given the source sequence $\mathbf{S}$, the
encoder determines the $\beta$ that produces the codeword closest
in Euclidean distance, i.e.,

$$g(\mathbf{S}) = \arg\min_{\beta \in \mathcal{B}_{M,L}} \|\mathbf{S} - \mathbf{A}\beta\|.$$

Decoder. This is a mapping $h : \mathcal{B}_{M,L} \to \mathbb{R}^n$. On receiving
$\beta \in \mathcal{B}_{M,L}$ from the encoder, the decoder produces the
reconstruction $h(\beta) = \mathbf{A}\beta$.

Since there are M columns in each of the L sections, the
total number of codewords is $M^L$. To obtain a compression
rate of R nats/sample, we therefore need

$$M^L = e^{nR}. \tag{2}$$

There are several choices for the pair (M, L) which satisfy
this. For example, L = 1 and $M = e^{nR}$ recovers the Shannon-style
random codebook in which the number of columns in the dictionary
$\mathbf{A}$ is $e^{nR}$, i.e., exponential in n. For our constructions, we
choose $M = L^b$ for some b > 1 so that (2) implies

$$L \log L = nR/b. \tag{3}$$

Thus L is now $\Theta\left(\frac{n}{\log n}\right)$, and the number of columns ML
in the dictionary $\mathbf{A}$ is now $\Theta\left(\left(\frac{n}{\log n}\right)^{b+1}\right)$, a
polynomial in n. This reduction in dictionary complexity can be
harnessed to develop computationally efficient encoders for the
sparse regression code. We note that the code structure
automatically yields low decoding complexity.

Since each codeword in a SPARC is a sum of L columns
of $\mathbf{A}$ (one from each section), codewords sharing one or more
common columns in the sum will be dependent. Also, SPARCs
are not linear codes since the sum of two codewords does not
equal another codeword in general.

III. Main Result

We begin with some background on error exponents.

The probability of error at distortion-level D of a
rate-distortion code $\mathcal{C}_n$ with block length n and encoder and
decoder mappings g, h is

$$P_e(\mathcal{C}_n, D) = P\left(|\mathbf{S} - h(g(\mathbf{S}))|^2 > D\right).$$
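
To make the construction of Section II concrete, the following minimal sketch builds a small design matrix, forms codewords by adding one column per section, and runs an exhaustive minimum-distance encoder; the final check corresponds to the error event $|\mathbf{S} - h(g(\mathbf{S}))|^2 > D$. All function names are hypothetical, the parameters are toy values, and setting $a^2$ equal to the source variance is an arbitrary assumption made only for illustration. The exhaustive search visits all $M^L$ codewords and is therefore usable only for very small codes.

```python
import itertools
import math
import random


def make_design_matrix(n, M, L, a2, D, rng):
    """Return M*L columns of length n with i.i.d. Normal(0, (a2 - D)/L) entries."""
    std = math.sqrt((a2 - D) / L)
    return [[rng.gauss(0.0, std) for _ in range(n)] for _ in range(M * L)]


def codeword(columns, choice, M):
    """Add one column per section; choice[sec] in range(M) picks the column in section sec."""
    n = len(columns[0])
    out = [0.0] * n
    for sec, j in enumerate(choice):
        col = columns[sec * M + j]
        for k in range(n):
            out[k] += col[k]
    return out


def normalized_sq_dist(x, y):
    """Squared Euclidean distance divided by the vector length."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x)


def ml_encode(columns, s, M, L):
    """Exhaustive minimum-distance search over all M**L codewords."""
    return min(
        itertools.product(range(M), repeat=L),
        key=lambda c: normalized_sq_dist(s, codeword(columns, c, M)),
    )


if __name__ == "__main__":
    rng = random.Random(0)
    n, M, L, sigma2, D = 8, 4, 3, 1.0, 0.3
    A = make_design_matrix(n, M, L, a2=sigma2, D=D, rng=rng)
    s = [rng.gauss(0.0, math.sqrt(sigma2)) for _ in range(n)]
    choice = ml_encode(A, s, M, L)
    distortion = normalized_sq_dist(s, codeword(A, choice, M))
    rate = L * math.log(M) / n  # nats per sample, since M**L = exp(n * rate)
    print(choice, distortion, rate, "error" if distortion > D else "ok")
```

At such a short block length the distortion target is frequently missed; the sketch only shows the mechanics of encoding and decoding, not the asymptotic behavior analyzed in the paper.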



Definition 1: The error exponent at distortion-level D of a
sequence of rate R codes $\{\mathcal{C}_n\}_{n=1,2,\ldots}$ is given by

$$r(R, D) = -\limsup_{n \to \infty} \frac{1}{n} \log P_e(\mathcal{C}_n, D). \tag{4}$$

The optimal error exponent for a rate-distortion pair (R, D)
is the supremum of the error exponents over all sequences of
codes with rate R, at distortion-level D.

The error exponent describes the asymptotic behavior of the
probability of error; bounds on the probability of error for
finite block lengths were obtained in [12], [13]. The optimal
error exponent was obtained by Marton [14] for discrete
memoryless sources and was extended to Gaussian sources
by Ihara and Kubo [15].

Fact 1: [15] For an i.i.d Gaussian source distributed as
Normal(0, $\sigma^2$) and mean-squared error distortion criterion, the
optimal error exponent at rate R and distortion-level D is

$$r^*(D, R) = \begin{cases} \frac{1}{2}\left(\frac{\rho^2}{\sigma^2} - 1 - \log\frac{\rho^2}{\sigma^2}\right) & R > R^*(D) \\ 0 & R \le R^*(D) \end{cases} \tag{5}$$

where $\rho^2$ is determined by

$$R = \frac{1}{2} \log \frac{\rho^2}{D}. \tag{6}$$

For $R > R^*(D)$, the exponent in (5) is the Kullback-Leibler
divergence between two zero-mean Gaussian distributions,
the first with variance $\rho^2$ and the second with variance $\sigma^2$.
[15] shows that at rate R, we can compress all sequences
which have empirical variance less than $\rho^2$ to within distortion
D with double-exponentially decaying probability of error.
Consequently, the dominant error event is obtaining a source
sequence with empirical variance greater than $\rho^2$, which has
exponent given by (5).
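
The exponent of Fact 1 is easy to evaluate numerically. The sketch below solves (6) for $\rho^2$ and then evaluates (5); the function names are illustrative, and the code assumes $0 < D < \sigma^2$.

```python
import math


def rho2_from_rate(R: float, D: float) -> float:
    """Solve (6), R = 0.5 * log(rho2 / D), for rho2."""
    return D * math.exp(2.0 * R)


def optimal_error_exponent(R: float, D: float, sigma2: float) -> float:
    """Evaluate (5), assuming 0 < D < sigma2.

    Above the rate-distortion function the exponent is the KL divergence
    between Normal(0, rho2) and Normal(0, sigma2); otherwise it is zero.
    """
    if R <= 0.5 * math.log(sigma2 / D):
        return 0.0
    x = rho2_from_rate(R, D) / sigma2
    return 0.5 * (x - 1.0 - math.log(x))


if __name__ == "__main__":
    # sigma2 = 1, D = 0.25, and a rate chosen so that rho2 = 2.
    print(optimal_error_exponent(0.5 * math.log(8.0), 0.25, 1.0))
```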

The main result of our paper is the following.

Theorem 1: Fix a rate R and target distortion D such that
$\sigma^2 / D > x^*$, where $x^* \approx 4.913$ is the solution of the equation

$$\frac{1}{2} \log x = \left(1 - \frac{1}{x}\right).$$

Fix $b > \frac{4R}{R - (1 - D/\rho^2)}$, where $\rho^2$ is determined by (6). For every
positive integer n, let $M_n = L_n^b$ where $L_n$ is determined by
(3). Then there exists a sequence $\mathcal{C} = \{\mathcal{C}_n\}_{n=1,2,\ldots}$ of rate
R sparse regression codes, with code $\mathcal{C}_n$ defined by an
$n \times M_n L_n$ design matrix, that attains the optimal error exponent
for distortion-level D given by (5).

Remark: The minimum value of b specified by the theorem
enables us to construct SPARCs with the optimal
error exponent. The proof also shows that we can construct
SPARCs which achieve the rate-distortion function for
$b > \frac{3R}{R - (1 - D/\rho^2)}$, with probability of error that decays
sub-exponentially in n when b is less than $\frac{4R}{R - (1 - D/\rho^2)}$.
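
The threshold in Theorem 1 can be checked numerically. The sketch below finds the root $x > 1$ of $\frac{1}{2}\log x = 1 - \frac{1}{x}$ by bisection and evaluates the quantity $R - (1 - D/\rho^2)$ that appears in the denominators above, using $D/\rho^2 = e^{-2R}$ from (6). The function names and the bracketing interval [2, 10] are assumptions made for this illustration.

```python
import math


def gap(x: float) -> float:
    """0.5 * log(x) - (1 - 1/x); its root x > 1 is the threshold of Theorem 1."""
    return 0.5 * math.log(x) - (1.0 - 1.0 / x)


def solve_x_star(lo: float = 2.0, hi: float = 10.0, tol: float = 1e-12) -> float:
    """Bisection on [lo, hi], where gap is increasing and changes sign."""
    if not (gap(lo) < 0.0 < gap(hi)):
        raise ValueError("bracket does not contain a sign change")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def denominator(R: float) -> float:
    """R - (1 - D/rho2), with D/rho2 = exp(-2R) by (6)."""
    return R - (1.0 - math.exp(-2.0 * R))


if __name__ == "__main__":
    x_star = solve_x_star()
    print(x_star, 0.5 * math.log2(x_star), denominator(1.0))
```

With this bracket the bisection converges to a root near 4.92, which corresponds to about 1.15 bits per sample and agrees with the threshold quoted in Theorem 1 to two significant figures. The denominator is positive exactly when $\rho^2 / D$ exceeds this root.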

IV. Proof of Theorem 1

Due to space constraints, we omit some details in the proof
which will be included in a longer version of this paper. Given
rate $R > R^*(D)$, let $\rho^2$ be determined by (6). For each
$a^2 < \rho^2$, we will show that there exists a family of SPARCs
that achieves the error exponent
$\frac{1}{2}\left(\frac{a^2}{\sigma^2} - 1 - \log\frac{a^2}{\sigma^2}\right)$, thereby
proving the theorem.

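Code Construction: For each block length n, pick L as
specified by (3) and $M = L^b$. Construct an $n \times ML$ design
matrix $\mathbf{A}$ with entries drawn i.i.d Normal(0, $\frac{a^2 - D}{L}$). The
codebook consists of all the vectors $\mathbf{A}\beta$, where $\beta \in \mathcal{B}_{M,L}$.

Encoding and Decoding: If the source sequence $\mathbf{S}$ is such
that $|\mathbf{S}|^2 \ge a^2$, then the encoder declares error. Else, it finds

$$\hat{\beta} = g(\mathbf{S}) = \arg\min_{\beta \in \mathcal{B}_{M,L}} \|\mathbf{S} - \mathbf{A}\beta\|.$$

The decoder receives $\hat{\beta}$ and reconstructs $\hat{\mathbf{S}} = \mathbf{A}\hat{\beta}$.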

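Error Analysis: Denoting the probability of error for this
random code by $P_{e,n}$, we have

$$P_{e,n} \le 1 \cdot P(|\mathbf{S}|^2 \ge a^2) + \int_0^{a^2} P(\mathcal{E}(\mathbf{S}) \mid |\mathbf{S}|^2 = z^2)\, d\nu(z^2) \le P(|\mathbf{S}|^2 \ge a^2) + \max_{z^2 \in (0, a^2)} P(\mathcal{E}(\mathbf{S}) \mid |\mathbf{S}|^2 = z^2), \tag{7}$$

where $\mathcal{E}(\mathbf{S})$ is the event that the minimum of $|\mathbf{S} - \mathbf{A}\beta|^2$ over
$\beta \in \mathcal{B}_{M,L}$ is greater than D, and $\nu(|\mathbf{S}|^2)$ is the distribution
of the random variable $|\mathbf{S}|^2$. The asymptotic behavior of the
first term above is straightforward to analyze and is given by
the following lemma, obtained through a direct application of
Cramér's large-deviation theorem [16].

Lemma 1:

$$\lim_{n \to \infty} -\frac{1}{n} \log P(|\mathbf{S}|^2 \ge a^2) = \frac{1}{2}\left(\frac{a^2}{\sigma^2} - 1 - \log\frac{a^2}{\sigma^2}\right).$$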
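
The exponent in Lemma 1 can be recovered with a short computation, sketched here under the assumption $a^2 > \sigma^2$. Since $|\mathbf{S}|^2$ is the empirical mean of the i.i.d variables $S_k^2$, Cramér's theorem gives the exponent as the Legendre transform of their log-moment generating function evaluated at $a^2$:

```latex
\begin{align*}
\Lambda(\theta) &= \log E\, e^{\theta S_k^2} = -\tfrac{1}{2}\log\left(1 - 2\theta\sigma^2\right),
  \qquad \theta < \tfrac{1}{2\sigma^2}, \\
\theta^* &= \tfrac{1}{2}\left(\tfrac{1}{\sigma^2} - \tfrac{1}{a^2}\right)
  \quad \text{(from } \Lambda'(\theta^*) = a^2\text{)}, \\
\sup_{\theta}\,\{\theta a^2 - \Lambda(\theta)\}
  &= \tfrac{1}{2}\left(\tfrac{a^2}{\sigma^2} - 1\right) - \tfrac{1}{2}\log\tfrac{a^2}{\sigma^2}
  = \tfrac{1}{2}\left(\tfrac{a^2}{\sigma^2} - 1 - \log\tfrac{a^2}{\sigma^2}\right).
\end{align*}
```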



The rest of the proof is devoted to bounding the second term
in (7). Recall that

$$P(\mathcal{E}(\mathbf{S}) \mid |\mathbf{S}|^2 = z^2) = P\left(|\hat{\mathbf{S}}(i) - \mathbf{S}|^2 > D,\ i = 1, \ldots, e^{nR} \ \Big|\ |\mathbf{S}|^2 = z^2\right) \tag{8}$$

where $\hat{\mathbf{S}}(i)$ is the ith codeword in the sparse regression
codebook. We now define indicator random variables $U_i(\mathbf{S})$
for $i = 1, \ldots, e^{nR}$ as follows:

$$U_i(\mathbf{S}) = \begin{cases} 1 & \text{if } |\hat{\mathbf{S}}(i) - \mathbf{S}|^2 \le D, \\ 0 & \text{otherwise.} \end{cases} \tag{9}$$

From (8), it is seen that

$$P(\mathcal{E}(\mathbf{S}) \mid |\mathbf{S}|^2 = z^2) = P\left(\sum_{i=1}^{e^{nR}} U_i(\mathbf{S}) = 0 \ \Big|\ |\mathbf{S}|^2 = z^2\right). \tag{10}$$

For a fixed $\mathbf{S}$, the $U_i(\mathbf{S})$'s are dependent. Suppose that the
codewords $\hat{\mathbf{S}}(i), \hat{\mathbf{S}}(j)$ respectively correspond to the binary
vectors $\beta(i), \beta(j) \in \mathcal{B}_{M,L}$. Recall that each vector in $\mathcal{B}_{M,L}$
is uniquely defined by the position of the 1 in each of the L
sections. If $\beta(i)$ and $\beta(j)$ overlap in r of their '1 positions',
then the column sums forming codewords $\hat{\mathbf{S}}(i)$ and $\hat{\mathbf{S}}(j)$ will
share r common terms.

For each codeword $\hat{\mathbf{S}}(i)$, there are $\binom{L}{r}(M - 1)^{L-r}$ other
codewords which share exactly r common terms with $\hat{\mathbf{S}}(i)$, for
$0 \le r \le L - 1$. In particular, there are $(M - 1)^L$ codewords
that are pairwise independent of $\hat{\mathbf{S}}(i)$. We now obtain an upper
bound for the probability in (10) using Suen's correlation
inequality [4]. First, some definitions.

Definition 2 (Dependency Graphs [4]): Let $\{U_i\}_{i \in \mathcal{I}}$ be a
family of random variables (defined on a common probability
space). A dependency graph for $\{U_i\}$ is any graph $\Gamma$ with
vertex set $V(\Gamma) = \mathcal{I}$ whose set of edges satisfies the following
property: if A and B are two disjoint subsets of $\mathcal{I}$ such that
there are no edges with one vertex in A and the other in B,
then the families $\{U_i\}_{i \in A}$ and $\{U_i\}_{i \in B}$ are independent.

Fact 2: [4, Example 1.5, p. 11] Suppose $\{Y_\alpha\}_{\alpha \in \mathcal{A}}$ is a
family of independent random variables, and each $U_i$, $i \in \mathcal{I}$,
is a function of the variables $\{Y_\alpha\}_{\alpha \in A_i}$ for some subset
$A_i \subseteq \mathcal{A}$. Then the graph with vertex set $\mathcal{I}$ and edge set
$\{ij : A_i \cap A_j \ne \emptyset\}$ is a dependency graph for $\{U_i\}_{i \in \mathcal{I}}$.

Remark 1: The graph $\Gamma$ with vertex set $V(\Gamma) =
\{1, \ldots, e^{nR}\}$ and edge set $e(\Gamma)$ given by

$$\{ij : i \ne j \text{ and } \hat{\mathbf{S}}(i), \hat{\mathbf{S}}(j) \text{ share at least one common term}\}$$

is a dependency graph for the family $\{U_i(\mathbf{S})\}_{i=1}^{e^{nR}}$, for each
fixed $\mathbf{S}$. This follows from Fact 2 by recognizing that each
$U_i$ is a function of a subset of the columns of the matrix $\mathbf{A}$
and the columns of $\mathbf{A}$ are picked independently in the code
construction.
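
For a small code the structure of this dependency graph can be checked by direct enumeration. The sketch below counts, for a reference codeword, how many other codewords share exactly r columns with it, and compares the counts with $\binom{L}{r}(M-1)^{L-r}$; the degree of each vertex is then $M^L - 1 - (M-1)^L$. The function names are hypothetical, and codewords are represented by the tuple of column indices chosen in each section.

```python
import itertools
import math


def overlap_counts(M: int, L: int) -> list:
    """counts[r] = number of codewords (other than the reference) that share
    exactly r columns with the reference codeword (0, ..., 0)."""
    ref = (0,) * L
    counts = [0] * (L + 1)
    for c in itertools.product(range(M), repeat=L):
        if c == ref:
            continue
        r = sum(1 for x, y in zip(c, ref) if x == y)
        counts[r] += 1
    return counts


def overlap_formula(M: int, L: int, r: int) -> int:
    """Closed-form count: choose the r shared sections, then pick a
    different column in each of the remaining L - r sections."""
    return math.comb(L, r) * (M - 1) ** (L - r)


if __name__ == "__main__":
    M, L = 4, 3
    counts = overlap_counts(M, L)
    degree = sum(counts[1:])  # neighbours in the dependency graph
    print(counts, degree, M ** L - 1 - (M - 1) ** L)
```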

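Fact 3 (Suen's Inequality [4]): Let $U_i \sim \text{Bern}(p_i)$, $i \in \mathcal{I}$,
be a finite family of Bernoulli random variables having a
dependency graph $\Gamma$. Write $i \sim j$ if $ij$ is an edge in $\Gamma$. Define

$$\lambda = \sum_{i \in \mathcal{I}} E U_i, \qquad \Delta = \frac{1}{2} \sum_{i \in \mathcal{I}} \sum_{j \sim i} E(U_i U_j), \qquad \delta = \max_{i \in \mathcal{I}} \sum_{k \sim i} E U_k.$$

Then

$$P\left(\sum_{i \in \mathcal{I}} U_i = 0\right) \le \exp\left(-\min\left\{\frac{\lambda}{2}, \frac{\lambda}{6\delta}, \frac{\lambda^2}{8\Delta}\right\}\right).$$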
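
Once the three quantities in Suen's inequality are known, the bound itself is a one-line computation. The sketch below evaluates $\exp(-\min\{\lambda/2, \lambda/(6\delta), \lambda^2/(8\Delta)\})$; the function name is illustrative, and dropping a term whose denominator is zero (which happens when the dependency graph has no edges) is a convention adopted here.

```python
import math


def suen_upper_bound(lam: float, delta: float, Delta: float) -> float:
    """Upper bound on P(sum of the indicators = 0):
    exp(-min{lam/2, lam/(6*delta), lam**2/(8*Delta)}).

    Terms with a zero denominator are skipped (no dependencies).
    """
    terms = [lam / 2.0]
    if delta > 0.0:
        terms.append(lam / (6.0 * delta))
    if Delta > 0.0:
        terms.append(lam ** 2 / (8.0 * Delta))
    return math.exp(-min(terms))


if __name__ == "__main__":
    # Here the three terms are 6, 2 and 6, so the middle term is binding.
    print(suen_upper_bound(12.0, 1.0, 3.0))
```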



We now apply this inequality with the dependency graph
specified in Remark 1 to compute an upper bound for (10).

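First term $\lambda/2$: Since each codeword is the sum of L
columns of $\mathbf{A}$, its entries are i.i.d Normal(0, $a^2 - D$), and
$E(U_i(\mathbf{S}))$ does not depend on i. For any fixed $\mathbf{S}$ with
$|\mathbf{S}|^2 = z^2$, we have

$$\lambda = \sum_{i=1}^{e^{nR}} E(U_i(\mathbf{S})) = e^{nR} P(U_1(\mathbf{S}) = 1 \mid |\mathbf{S}|^2 = z^2). \tag{11}$$

Using the strong version of Cramér's large-deviation theorem
by Bahadur and Rao [17], we can obtain the following lemma.

Lemma 2: For all sufficiently large n and $z^2 \in (0, a^2)$,

$$P(U_i(\mathbf{S}) = 1 \mid |\mathbf{S}|^2 = z^2) \ge P(U_i(\mathbf{S}) = 1 \mid |\mathbf{S}|^2 = a^2) \ge \frac{\kappa}{\sqrt{n}}\, e^{-\frac{n}{2} \log\frac{a^2}{D}}$$

for some constant $\kappa > 0$.

We thus have a lower bound on $\lambda$ for sufficiently large n.

Second term $\lambda/(6\delta)$: Due to the symmetry of the code
construction,

$$\begin{aligned}
\delta &= \max_{i \in \{1, \ldots, e^{nR}\}} \sum_{k \sim i} P(U_k(\mathbf{S}) = 1 \mid |\mathbf{S}|^2 = z^2) \\
&= \sum_{k \sim i} P(U_k(\mathbf{S}) = 1 \mid |\mathbf{S}|^2 = z^2) \quad \forall i \in \{1, \ldots, e^{nR}\} \\
&= \sum_{r=1}^{L-1} \binom{L}{r} (M - 1)^{L-r}\, P(U_1(\mathbf{S}) = 1 \mid |\mathbf{S}|^2 = z^2) \\
&= \left(M^L - 1 - (M - 1)^L\right) P(U_1(\mathbf{S}) = 1 \mid |\mathbf{S}|^2 = z^2).
\end{aligned} \tag{12}$$

Using this together with the expression for $\lambda$ in (11), we have

$$\frac{\lambda}{\delta} = \frac{M^L}{M^L - 1 - (M - 1)^L} = \frac{1}{1 - L^{-bL} - (1 - L^{-b})^L} \tag{13}$$

where we have used $M = L^b$. Using a Taylor expansion of
$(1 - L^{-b})^L$, we can show that for L sufficiently large

$$\frac{\lambda}{\delta} \ge \frac{1}{L^{-(b-1)} - \frac{1}{2} L^{-2(b-1)} + o(L^{-2(b-1)})} \ge L^{b-1}. \tag{14}$$

Third term $\lambda^2/(8\Delta)$: We lower bound $\lambda^2/\Delta$ by obtaining a
lower bound for $\lambda^2$ using Lemma 2 and an upper bound for
the denominator $\Delta$ as follows (see Footnote 1):

$$\begin{aligned}
\Delta &= \frac{1}{2} \sum_{i=1}^{e^{nR}} \sum_{j \sim i} E\left(U_i(\mathbf{S}) U_j(\mathbf{S}) \mid |\mathbf{S}|^2 = a^2\right) \\
&= \frac{e^{nR}}{2} \sum_{r=1}^{L-1} \binom{L}{r} (M - 1)^{L-r}\, P\left(U_i(\mathbf{S}) = U_j(\mathbf{S}) = 1 \mid \mathcal{E}_{ij}(r), |\mathbf{S}|^2 = a^2\right)
\end{aligned} \tag{15}$$

where $\mathcal{E}_{ij}(r)$ is the event that $\hat{\mathbf{S}}(i), \hat{\mathbf{S}}(j)$ have exactly r
common terms. We have

$$\begin{aligned}
&P\left(U_i(\mathbf{S}) = 1, U_j(\mathbf{S}) = 1 \mid \mathcal{E}_{ij}(r), |\mathbf{S}|^2 = a^2\right) \\
&= P\left(|\hat{\mathbf{S}}(i) - \mathbf{S}|^2 \le D,\ |\hat{\mathbf{S}}(j) - \mathbf{S}|^2 \le D \mid \mathcal{E}_{ij}(r), |\mathbf{S}|^2 = a^2\right) \\
&= P\left(\frac{1}{n} \sum_{k=1}^{n} (\hat{S}_k(i) - a)^2 \le D,\ \frac{1}{n} \sum_{k=1}^{n} (\hat{S}_k(j) - a)^2 \le D\right)
\end{aligned} \tag{16}$$

where the last equality is due to the fact that $(\hat{\mathbf{S}}(i), \hat{\mathbf{S}}(j))$
has the same joint distribution as $(O\hat{\mathbf{S}}(i), O\hat{\mathbf{S}}(j))$ for any
orthogonal (rotation) matrix O. The $(\hat{S}_k(i), \hat{S}_k(j))$ pairs are
i.i.d across k, and each is jointly Gaussian with zero-mean
vector and covariance matrix

$$K_r = (a^2 - D) \begin{bmatrix} 1 & \alpha \\ \alpha & 1 \end{bmatrix}, \quad \text{where } \alpha = \frac{r}{L}, \tag{17}$$

when $\hat{\mathbf{S}}(i), \hat{\mathbf{S}}(j)$ share r common terms. Using a
two-dimensional Chernoff bound, we have $\forall u, t < 0$

$$\begin{aligned}
&\frac{1}{n} \log P\left(\frac{1}{n} \sum_{k=1}^{n} (\hat{S}_k(i) - a)^2 \le D,\ \frac{1}{n} \sum_{k=1}^{n} (\hat{S}_k(j) - a)^2 \le D\right) \\
&\le \log E\left(e^{u(\hat{S}_1(i) - a)^2 + t(\hat{S}_1(j) - a)^2}\right) - (u + t) D = -C_\alpha(u, t)
\end{aligned} \tag{18}$$

where

$$C_\alpha(u, t) = (u + t) D - \frac{a^2 \left(u + t - 4\gamma^2 u t (1 - \alpha)\right)}{1 - 2\gamma^2 (u + t) + 4\gamma^4 u t (1 - \alpha^2)} + \frac{1}{2} \log\left(1 - 2\gamma^2 (u + t) + 4\gamma^4 u t (1 - \alpha^2)\right) \tag{19}$$

with $\gamma^2 = a^2 - D$.

Using (18) in (16) and then in (15), and using Lemma 2 to
bound $\lambda^2$, we obtain

$$\frac{\lambda^2}{\Delta} \ge \frac{e^{2n\left(R - \frac{1}{2} \log\frac{a^2}{D} - \gamma_n\right)}}{e^{nR} \sum_{r=1}^{L-1} \binom{L}{r} (M - 1)^{L-r} e^{-n C_\alpha(u, t)}} \tag{20}$$

where

$$\gamma_n = \frac{\log n}{2n} + \frac{\kappa}{n}. \tag{21}$$

In the sequel, we will use $\kappa$ to denote a generic positive
constant whose exact value is not needed. Using $\mathcal{A}_L$ to denote
the set $\{1/L, 2/L, \ldots, (L-1)/L\}$, $\alpha$ to denote $r/L$, and noting
that $M^L = e^{nR}$, we have

$$\frac{\lambda^2}{\Delta} \ge \frac{M^L \exp\left[-n\left(\log(a^2/D) + 2\gamma_n\right)\right]}{\sum_{\alpha \in \mathcal{A}_L} \binom{L}{L\alpha} (M - 1)^{L(1-\alpha)} e^{-n C_\alpha(u, t)}} \ge \frac{M^L \exp\left[-n\left(\log(a^2/D) + 2\gamma_n\right)\right]}{(L - 1) \max_{\alpha \in \mathcal{A}_L} \binom{L}{L\alpha} M^{L(1-\alpha)} e^{-n C_\alpha(u, t)}}. \tag{22}$$

Substituting $M = L^b$ and taking logarithms, we get

$$\log\frac{\lambda^2}{\Delta} \ge \min_{\alpha \in \mathcal{A}_L} \left\{ (bL\alpha - 1) \log L - \log\binom{L}{L\alpha} - n\left(\log(a^2/D) + 2\gamma_n - C_\alpha(u, t)\right) \right\}. \tag{23}$$

Footnote 1: Here we directly lower bound $\lambda^2/\Delta$ for $|\mathbf{S}|^2 = a^2$. Formally, a
lower bound on $\lambda^2/\Delta$ can be obtained using similar steps for $|\mathbf{S}|^2 = z^2$ for
$z^2 \in (0, a^2)$, and it can be shown to be decreasing in $z^2$.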
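
The Chernoff exponent $C_\alpha(u, t)$ defined through (18) can be evaluated numerically from the moment generating function of a quadratic form in a bivariate Gaussian vector. The sketch below is an independent illustration rather than the paper's own computation: the function name is hypothetical, and it relies on the standard identity $\log E\, e^{(\mathbf{v} - \mathbf{m})^T U (\mathbf{v} - \mathbf{m})} = \mathbf{m}^T (I - 2UK)^{-1} U \mathbf{m} - \frac{1}{2} \log\det(I - 2UK)$ for $\mathbf{v} \sim$ Normal(0, K), with $U = \text{diag}(u, t)$ and $\mathbf{m} = (a, a)$.

```python
import math


def chernoff_exponent(u: float, t: float, alpha: float, a2: float, D: float) -> float:
    """C_alpha(u, t) = (u + t) * D - log E exp(u*(X - a)**2 + t*(Y - a)**2),
    where (X, Y) is zero-mean Gaussian with covariance
    (a2 - D) * [[1, alpha], [alpha, 1]] and u, t < 0."""
    if u >= 0.0 or t >= 0.0:
        raise ValueError("u and t must be negative")
    g2 = a2 - D
    # Entries of the 2x2 matrix I - 2 * diag(u, t) * K.
    m11 = 1.0 - 2.0 * g2 * u
    m12 = -2.0 * g2 * u * alpha
    m21 = -2.0 * g2 * t * alpha
    m22 = 1.0 - 2.0 * g2 * t
    det = m11 * m22 - m12 * m21
    # Solve (I - 2UK) w = (u, t) by Cramer's rule.
    w1 = (m22 * u - m12 * t) / det
    w2 = (m11 * t - m21 * u) / det
    log_mgf = a2 * (w1 + w2) - 0.5 * math.log(det)
    return (u + t) * D - log_mgf


if __name__ == "__main__":
    a2, D = 1.0, 0.25
    # Independent codewords (alpha = 0): the exponent is twice the
    # single-codeword exponent 0.5 * log(a2 / D).
    print(chernoff_exponent(-1.0 / (2 * D), -1.0 / (2 * D), 0.0, a2, D))
```

Two sanity checks are available in closed form: for $\alpha = 0$ and $u = t = -\frac{1}{2D}$ the exponent equals $\log(a^2/D)$, and in the limiting case $\alpha = 1$ with $u = t = -\frac{1}{4D}$ it equals $\frac{1}{2}\log(a^2/D)$, as expected when the two codewords coincide.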



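Dividing through by $L \log L$ and using the relation (3) as well
as the definition (21) for $\gamma_n$, we get

$$\frac{\log(\lambda^2/\Delta)}{L \log L} \ge \min_{\alpha \in \mathcal{A}_L} \left\{ b\left[\alpha + \frac{C_\alpha(u, t) - \log(a^2/D)}{R}\right] - \frac{1}{L} - \frac{\log(L \log L) + \kappa}{L \log L} - \frac{\log\binom{L}{L\alpha}}{L \log L} \right\}. \tag{24}$$

We need the right side of the above to be positive since we
want $\lambda^2/\Delta$ to grow with L. For this, we need:

$$b > \frac{\frac{2}{L} + \frac{\log\binom{L}{L\alpha}}{L \log L} + \frac{\kappa + \log\log L}{L \log L}}{\alpha + \frac{C_\alpha(u, t) - \log(a^2/D)}{R}}, \quad \forall \alpha \in \mathcal{A}_L. \tag{25}$$

Further, we need the denominator of (25) to be positive.
Using $u = t = -\frac{1}{2D(1 + \alpha)}$ for $C_\alpha(u, t)$, we can show that
(25) implies the following simplified condition for sufficiently
large L:

$$b > b_{min} = \frac{3R}{R - (1 - D/a^2)}. \tag{26}$$

When $b > b_{min}$, the right side of (24) will be strictly positive
for large enough L. Since $a^2$ is any number less than $\rho^2$ where
$R = \frac{1}{2} \log\frac{\rho^2}{D}$, the condition for the denominator to be positive
is

$$\frac{1}{2} \log\frac{\rho^2}{D} > 1 - \frac{D}{\rho^2}. \tag{27}$$

This is satisfied whenever $\sigma^2/D > x^*$ as required by the
theorem. Thus for large enough L, (24) becomes

$$\frac{\log(\lambda^2/\Delta)}{L \log L} \ge \frac{1}{L} (b - b_{min}) \left(1 - \frac{1}{R}\left(1 - \frac{D}{a^2}\right)\right). \tag{28}$$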

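Therefore for sufficiently large L,

$$\frac{\lambda^2}{\Delta} \ge L^{(b - b_{min})\left(1 - \frac{1}{R}\left(1 - \frac{D}{a^2}\right)\right)}. \tag{29}$$

Combining the bounds obtained above for each of the three
terms, we have for sufficiently large n,

$$P\left(\sum_{i=1}^{e^{nR}} U_i(\mathbf{S}) = 0\right) \le e^{-\kappa \min\{T_1, T_2, T_3\}} \tag{30}$$

where

$$T_1 \ge e^{n\left(R - \frac{1}{2} \log\frac{a^2}{D} - \gamma_n\right)}, \qquad T_2 \ge L^{b-1}, \qquad T_3 \ge L^{(b - b_{min})\left(1 - \frac{1}{R}\left(1 - \frac{D}{a^2}\right)\right)}. \tag{31}$$

Using this in (7), we obtain

$$P_{e,n} \le P(|\mathbf{S}|^2 \ge a^2) + \max_{z^2 \in (0, a^2)} P(\mathcal{E}(\mathbf{S}) \mid |\mathbf{S}|^2 = z^2) \le e^{-n T_0} + e^{-\kappa \min\{T_1, T_2, T_3\}} \tag{32}$$

where $T_0 = \frac{1}{2}\left(\frac{a^2}{\sigma^2} - 1 - \log\frac{a^2}{\sigma^2}\right)$ from Lemma 1. Since
$R = \frac{1}{2} \log\frac{\rho^2}{D}$, $T_1$ grows exponentially in n for all $a^2 < \rho^2$.
When b > 2, $T_2 = L^{b-1}$ grows faster than $n = bL \log L / R$.
For

$$(b - b_{min}) \left(1 - \frac{1}{R}\left(1 - \frac{D}{a^2}\right)\right) > 1,$$

$T_3$ also grows faster than n. This corresponds to the minimum
value of b specified in the statement of the theorem. Therefore,
under this condition, the probability of error for large n is
dominated by the first term in (32). This completes the proof.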



V. Conclusion 

We have studied a new ensemble of codes for Gaussian 
source coding, constructed using the framework of sparse 
linear regression. The codewords are structured linear combi- 
nations of elements of a dictionary; the size of the dictionary is 
a low-order polynomial in the block length. We showed that 
this ensemble achieves the optimal error exponent with ML 
encoding for all distortions below $\sigma^2/x^*$, or equivalently for
rates higher than 1.15 bits per source sample. This value may
be an artifact of some looseness in our bounding techniques,
especially in analyzing the $\lambda^2/\Delta$ term of Suen's inequality.
We also expect that the minimum value of b required by the
theorem can be tightened by using a tighter large-deviation
bound for $\Delta$. The final goal is to develop computationally
feasible encoding algorithms that rapidly approach the rate- 
distortion bound with growing block length. 

References 

[1] A. Barron and A. Joseph, "Least squares superposition codes of
moderate dictionary size are reliable at rates up to capacity," to appear in
IEEE Trans. Inf. Theory. Also in Proc. 2010 IEEE ISIT.

[2] A. Barron and A. Joseph, "Toward fast reliable communication at rates
near capacity with Gaussian noise," in 2010 IEEE ISIT. Also Yale Dept.
of Stat. Technical Report, 2011.

[3] I. Kontoyiannis, K. Rad, and S. Gitzenis, "Sparse superposition codes
for Gaussian vector quantization," in 2010 IEEE Inf. Theory Workshop,
p. 1, Jan. 2010.

[4] S. Janson, Random Graphs. Wiley, 2000.

[5] T. M. Cover and J. A. Thomas, Elements of Information Theory. John
Wiley and Sons, Inc., 2001.

[6] M. Eyuboglu and G. D. Forney, Jr., "Lattice and trellis quantization with
lattice- and trellis-bounded codebooks: high-rate theory for memoryless
sources," IEEE Trans. Inf. Theory, vol. 39, pp. 46-59, Jan. 1993.

[7] R. Zamir, S. Shamai, and U. Erez, "Nested linear/lattice codes for
structured multiterminal binning," IEEE Trans. Inf. Theory, vol. 48,
pp. 1250-1276, June 2002.

[8] A. Gupta, S. Verdú, and T. Weissman, "Rate-distortion in near-linear
time," in 2008 IEEE Int. Symp. on Inf. Theory, pp. 847-851.

[9] I. Kontoyiannis and C. Gioran, "Efficient random codebooks and
databases for lossy compression in near-linear time," in IEEE Inf. Theory
Workshop on Networking and Inf. Theory, pp. 236-240, June 2009.

[10] S. Jalali and T. Weissman, "Rate-distortion via Markov Chain Monte
Carlo," in 2010 IEEE Int. Symp. on Inf. Theory.

[11] S. Korada and R. Urbanke, "Polar codes are optimal for lossy source
coding," IEEE Trans. Inf. Theory, vol. 56, pp. 1751-1768, April 2010.

[12] D. Sakrison, "A geometric treatment of the source encoding of a
Gaussian random variable," IEEE Trans. Inf. Theory, vol. 14, pp. 481-486,
May 1968.

[13] V. Kostina and S. Verdú, "Fixed-length lossy compression in the
finite blocklength regime: Gaussian source," in 2011 IEEE Inf. Theory
Workshop.

[14] K. Marton, "Error exponent for source coding with a fidelity criterion,"
IEEE Trans. Inf. Theory, vol. 20, pp. 197-199, Mar. 1974.

[15] S. Ihara and M. Kubo, "Error exponent for coding of memoryless
Gaussian sources with a fidelity criterion," IEICE Trans. Fundamentals,
vol. E83-A, pp. 1891-1897, Oct. 2000.

[16] A. Dembo and O. Zeitouni, Large Deviations Techniques and
Applications. Springer, 1998.

[17] R. R. Bahadur and R. R. Rao, "On deviations of the sample mean," The
Annals of Mathematical Statistics, vol. 31, no. 4, 1960.



